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Abstract. An integer n is (y, z)-semismooth if n = pm where m is an integer 
with all prime divisors < y and p is 1 or a prime < z. Large quantities of 
semismooth integers are utilized in modern integer factoring algorithms, such 
as the number field sieve, that incorporate the so-called large prime variant. 
Thus, it is useful for factoring practitioners to be able to estimate the value of 
ty(x, y, z), the number of (y, z)-semismooth integers up to x, so that they can 
better set algorithm parameters and minimize running times, which could be 
weeks or months on a cluster supercomputer. In this paper, we explore sev- 
eral algorithms to approximate ^(x, y, z) using a generalization of Buchstab's 
identity with numeric integration. 



1. Introduction 

The security of the public- key cryptosystem RSA 20, 17] is based on the practical 
difficulty of integer factoring. 

The fastest current general-purpose integer factoring algorithm is the number 
field sieve |161 |S], which in its basic form makes use of smooth numbers, integers 
with only small prime divisors. This has inspired research into algorithms to ap- 
proximately count smooth numbers [14l[22, 24, 25] [6l [18] . However, most imple- 
mentations of the number field sieve make use of the so-called large prime variant 
§6.1.4]. So we want, in fact, to count smooth integers that admit at most one 
slightly larger prime divisor, or semismooth numbers. (See, for example, the details 
on the factorization of a 768-bit RSA modulus [T3] where smoothness bounds are 
discussed near the end of §2.2.) 

The principal contribution of this paper is twofold: 

(1) We present data showing that the key to estimating ^(x, y, z) accurately 
is an algorithm to estimate ^(x,y) accurately, and 

(2) We present head-to-head comparisons of four algorithms for estimating 
ty{x,y,z). 

Previous work was done by Bach and Peralta [3] and generalized by Zhang and 
Ekkelkamp [27l[T0]; we discuss this below. 

This paper is organized as follows. We begin with some definitions, and briefly 
discuss computing exact counts of semismooth integers. We then give our main 
theoretical result, a generalized Buchstab identity, which together with numerical 
integration, is the basis of all our algorithms. We then present four different algo- 
rithms in some detail, one based on the Dickman p function and three based on 
the saddle point methods of Hildebrand and Tenenbaum [T3] , along with empirical 
results for each algorithm. As one might expect, we discover a tradeoff in algorithm 
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Table 1. Exact Values of ^(x, y, z), x = 2 



y 


z = 2 i0 


z = 2 12 


z = 2 i4 


z = 2 itJ 


z = 2 i8 


z = 2 M 


2' 1 


58916 


170906 


503392 


1500366 


4513650 


13596962 


2 4 


6132454 


15111450 


36766896 


88920834 


213965871 


508844667 


2 6 


323105012 


678707129 


1326493628 


2499496319 


4603776946 


8298667472 


2 s 


3157707079 


6694272918 


11837179134 


19296840890 


30059136386 


45290400397 


2 io 


7138986245 


21494669620 


39400743040 


61719198990 


89501569374 


123781662719 


2 12 




30641713551 


68600140477 


111769092210 


160884758713 


215725054639 


2 14 






80324574755 


145583683889 


214469637137 


286976436489 


2 16 








155283653287 


241316058768 


329067588881 


2 18 










248857736183 


349744885541 


2 20 












354982241411 



choice between speed and accuracy. We follow this up with an elaboration on some 
numerical details. 

2. Definitions 

Let P(n) denote the largest prime divisor of the integer n. An integer n is 
y-smooth if P(n) < y, and ^(x,y) counts the integers n < x that are y-smooth. 

An integer n is (y, z)-semismooth if we can write n = mp where m is y-smooth 
and p < z is a prime or 1. ^(x, y, z) counts the integers n < x that are (y,z)- 
semismooth. (Generalizations to more than one exceptional prime have been de- 
fined by Zhang and Ekkelkamp [27l HQ].) Observe that ^{x,y,y) = \&(x, y), the 
function ^{x, y,x) counts integers whose second-largest prime divisor is bounded 
by y, and *&(x, 1, z) = min{7r(x), tt(z)}, where tt(x) is the number of primes up to 
x. 

Our basic unit of work is the floating point operation. Along with the four basic 
arithmetic operations (+, — , X,-=-) we include square roots, logarithms, and expo- 
nentials, since their complexity is close to that of multiplication (see for example 

3. Exact Counts 

Using a prime number sieve, such as the sieve of Eratosthenes, we can completely 
factor all integers up to x in 0(x log log x) arithmetic operations, and thereby com- 
pute exact values of ^(x, y, z). Of course this is not a practical approach for large 
x, but it is useful for evaluating the accuracy of approximation algorithms. So we 
wrote a program to do this, based on a segmented sieve of Eratosthenes (see [23] 
for prime number sieve references). Our results appear in Table [TJ 

4. A Generalized Buchstab Identity 
We have the following version of Buchstab's identity (see for example [26l p. 365]): 

(1) V(x,y) = *(s,2)+ J2 ®( X /P>P)> 

2<p<y 
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which is obtained by summing over the largest prime divisor of y-smooth integers 
n < x. Using this same idea gives us the following: 

(2) ^x,y,z) = ^(x,y)+ ^ *( X /P>v)- 

y<p<z 

As one can see, the identity is obtained by summing over the largest prime divisor. 



5. Approximate Counts 

As mentioned in the Introduction, there are many algorithms to estimate values 
of ^{x,y). We could choose one of them, compute a list of primes up to z, and 
then apply $2$ to approximate ^(x, y, z). We found that this does, in fact, give 
fairly accurate estimates, but the resulting algorithms are quite slow since roughly 
0(z/logz) evaluations of ^(x/p,y) (one for each p, y < p < z) are needed. 

Our approach, then, is to replace the sum in ^ with a integral, and then use 
numeric integration to evaluate it [8j §7.2]; we used Simpson's rule. We found that in 
practice, the relative error introduced by replacing the sum with an integral that was 
then estimated, was less than the relative error introduced by the approximation 
algorithms for ^(x, y). 

Let us define li(ir) := J 2 dt/logt, and let e(x) :— ir(x) — \\{x). By the prime 
number theorem, e{x) — xj exp[f2(yTog~F)]; if we assume the Riemann Hypothesis, 
e(x) = 0{^\ogx) [IT]. 

We have the following (see [U §2.7]): 

Lemma 1. Let f be a continuously differ entiable function on an open interval 
containing [2, oo), and let 2 < y < z. Then 




+f(z)e(z)-f(y)e(y)- / e(t)f(t)dt. 

Of course we cannot apply this lemma to ^(x, y) directly, so we use an estimate 
instead. Define p(u) as the unique continuous solution to 

p(u) = 1 (0 < u < 1) 
p(u - 1) + up'(u) = (u > 1). 

Note that p £ C 1 for u > 1. Hildebrand [11] proved that for e > we have 

(3) ^y) = X p{u)(i + oJ^±^ 

uniformly on the set defined by 1 < u < exp((log y) 3 ^ 5_£ ) and y > 2. Here, 
u := u(x,y) = logs/ logy. 

Theorem 1. Given 2 < y < z < x, let e > 0, and assume that 1 < log(x/z)/ logy < 
log a;/ logy < exp((logy) 3 / 5 ~ c ). Then 

(4) *{x, y, z) = (*(x, y) + J' ^M^- d^j (1 + o(l)). 
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Proof. Define /(*) := (x/t)p(\og(x/i)/\ogy). By © we have f(p) = &(x/p,y)(l + 
o(l)) for all primes y < p < z. / is differcntiable and continuous, with f'(t) ~ 
—f(t)/t. It is then straightforward to show that e(t)f'(t) = o(f(t)/ logi). Also 
since / is decreasing we have e(z)f(z) — e(y)f(y) < (e(z) — e(y))f(z) — o(ir(z) — 
ir(y))f(z) = o(^2 < <z f(p))- We then apply Lemma Q] and use (|2|), and finally 
substitute ^(x/t,y) back in for f(t) to complete the proof. □ 

We also investigated a third approach. It would be nice to have some function 
g(y, z) where 

(5) f(x,y,z)^V(x,y)-g(y,z) 
if this is possible. Rewriting ([2]) we have 



(6) y(x,y,z)=V(x,y)- 1 + 



y 

y<p<z y ,yi 



A very crude estimate of ^(x/p, y)/ 1 i'(x, y) « l/p leads to 

(7) *(x,y,z) w *(x,y) ■ (l + log(logz/logy))). 

In practice, this is too crude to be useful. However, this estimate can certainly 
be improved using, for example, Theorem 11 from |26l §5.5]. This is a possible 
direction for future work. 

5.1. The Method of Bach and Peralta. The first algorithm to try would be to 
use the estimate ^f(x,y) ~ xp(u) from ([3]) and plug it into Theorem [T] This, in 
fact, is simply another way to derive the algorithm of Bach and Peralta [3]. They 
define u := log x/ logy, v := log xj log z and 



(8) a(u, v) := p(u) + / (p(u — u/w)/w)dw. 

J v 

They then prove that for fixed u, v and x — > 00, 

(9) ^{x, y, z) w xa(u, v). 

We can use (J2J) to obtain the same approximation, as follows: 

*(ar,y,z) « V(%,y)+ f {^{x/t,y)/\ogt)dt 

w x ■ p(\ogx/\ogy) 

+ f (x/t\ogt)p{\og{x/t)/\ogy)dt 
Jy 



p(u) + / (p(u — u/w)/w)dw 

\ Jv 

= x ■ a(u, v). 

We adapted code written by Peralta to compute values of the Dickman p function 
(accurate to roughly 8 decimal digits) to compute a. Bach and Peralta discuss 
methods to compute both p and a in some detail in [3j. We present the results 
from this algorithm in Table [2] As for all our algorithms, we show the ratio of what 
the algorithm produces as an estimate divided by the actual values we computed 
earlier. The closer we are to 1 the better the estimates. 
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Table 2. x ■ a(u, v)/$t(x, y, z), x = 2 



y 


z = 2 il) 


z = 2 iy 


z = 2 i4 


z = 2 w 


z = 2 is 


z = 2*> 


2' 2 


7.1e-14 


1.3e-12 


2.2c-ll 


3.4e-10 


4.9e-09 


6.2e-08 


2 4 


0.0042182 


0.007326 


0.012701 


0.021657 


0.035951 


0.058 


2 6 


0.26627 


0.2956 


0.32972 


0.36872 


0.41167 


0.45851 


2 8 


0.64392 


0.66898 


0.68836 


0.70731 


0.72686 


0.74754 


2 io 


0.75636 


0.80963 


0.82644 


0.83778 


0.84679 


0.85628 


2 12 




0.84863 


0.87886 


0.89096 


0.89979 


0.90729 


2 14 






0.89495 


0.9169 


0.92614 


0.93196 


2 16 








0.92275 


0.93539 


0.9421 


2 18 










0.93769 


0.94722 


2 20 












0.95044 



This algorithm is fast, but the results are not as good as we might desire. As 
one might expect, they are about as good as what was found for estimating counts 
of smooth numbers using ([3]) in |14j . 

Ekkelkamp [10] pointed out that <r(u, v) could be made more accurate by adding 
a quantity which, in our notation, is 



(1 - j)x 



P(u-1)+I P{u - U,W - l) d W 
' to — 1 



log a 

This can be derived by using the better approximation 

V(x, y) « xp(u) + Q-—^ p(u - 1), 
logx 

due to Ramaswami [191 Theorem 1]. To get her formula, substitute w = 1/A in the 
integral; note that we need the first term inside the brackets, since our definition 
of semismoothness differs from hers. (Our "large prime" can be 1.) This correc- 
tion does not require much additional effort; essentially just one more numerical 
integration. 

Our experiments with this indicate that the additional term enhances accuracy 
significantly when y and z are large. This is roughly the bottom corner of Table [2] 
However, the accuracy still drops off dramatically for smaller values of y. 



5.2. A Saddlepoint Method. Our second algorithm is based on Algorithm HT 
for estimating ^(x,y) presented in |14j . Define 

C(s,y) := JI(l-p-)- 1 ; 

p<y 

<f>(s,y) := logC(s,y); 
d k 
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The functions 4>k can be expressed as sums over primes. Indeed, we have 

4>{s,y) = -^log(l-p- s ); 



p<y 



p s (\ogp) 



2 



(pa _ 1} 2 • 

p<y 

Thus, with a list of primes up to y, the quantities £(s, y), <fii(s, y), and </>2(s, y) can 
be computed in 0(y / logy) floating point operations. 
Define 

HT(x,y,s) - 



s^/2-K(j)2{s,y) 

and let a be the unique solution to 4>\ (a, y) +log x = 0. Hildebrand and Tenenbaum 
proved the following [12] : 

Theorem 2. 

(10) = I?T(x,i/, a)- fl + ofi + tel 

uniformly for 2 < y < x. 

This gives us Algorithm HT [14] : 

(1) Find the primes up to y. 

(2) Compute an approximation a' to a using binary search and Newton's 
method. Make sure that \a — a!\ = 0(1/ (ulogx)). 

(3) Output HT(x,y,a'). 

Write HT(x, y) for the value output in the last step. The running time is O ( v 
floating point operations. 

We simply plugged Algorithm HT into ^ to estimate ^(x, y, z) using the saddle 
point method as follows: 

(11) HT(x,y,z):=HT(x,y)+ f (HT(x/t,y)/t)dt. 

In Table |3] we give the results for this algorithm, which are quite good. The method 
is, however, a bit slow. 

Using the summation algorithms described in [2] , we can lower the exponent of y 
in the running time from 1 to 2/3. We give some details for this in §6.31 We did not 
implement this improvement, however, because it would not change the computed 
results, and the method of the section below is faster. 

5.3. Assuming Riemann's Hypothesis. This is the same as Algorithm HT, 
only sums over primes (£, 4>i,4>2) above roughly ^Jy are estimated using the prime 
number theorem plus the Riemann Hypothesis [H] ■ It is much faster than Al- 
gorithm HT and nearly as accurate; its running time is roughly y/y. Let HTf(x, y) 
denote the estimate this algorithm computes for ty(x,y), and HTf(x,y, z) the esti- 
mate after using HTf(x,y) with {T]). We present our results in Table H] They are 
nearly as good as those from Algorithm HT, and the method is much faster. 



y log log x , y 

y ~ log log y 
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Table 3. HT(x,y,z)/^(x,y,z), x = 2 



y 


z = 2 iu 


z = 2 12 


z = 2 i4 


z = 2 i(j 


z = 2 4 « 


z = 2 M 


2' 2 


1.0969 


1.0712 


1.0641 


1.067 


1.0728 


1.0833 


2 4 


1.0592 


1.039 


1.031 


1.0353 


1.0507 


1.083 


2 (i 


1.0414 


1.0281 


1.0205 


1.024 


1.0566 


1.1582 


2 s 


1.0194 


1.0165 


1.0135 


1.0127 


1.0296 


1.1338 


2 10 


1.0024 


1.0104 


1.0101 


1.01 


1.0112 


1.0437 


2 12 




1.0043 


1.0104 


1.01 


1.0078 


1.0107 


2 14 






1.0046 


1.0084 


1.0115 


1.0143 


2 16 








1.0101 


1.016 


1.017 


2 18 










1.0142 


1.0116 


2 20 












1.0083 




Table 4. HT f (x,y,z 


)/^{x,y,z, 


, x = 2 




y 


z = 2 Lti 


z = 2^ 


z = 2 i4 


z = 2^ 


z = 2 i8 


z = 2 a) 


2^ 


1.0969 


1.0712 


1.0641 


1.067 


1.0728 


1.0833 


2 4 


1.0592 


1.039 


1.031 


1.0353 


1.0507 


1.083 


2 6 


1.0414 


1.0281 


1.0205 


1.024 


1.0566 


1.1582 


2 8 


1.0194 


1.0165 


1.0135 


1.0127 


1.0296 


1.1338 


2 io 


0.99773 


1.0069 


1.007 


1.0073 


1.0088 


1.0413 


2 12 




1.0152 


1.0186 


1.0171 


1.0141 


1.0164 


2 14 






1.0177 


1.0186 


1.0203 


1.0222 


2 16 








1.021 


1.0247 


1.0246 


2 18 










1.021 


1.0172 


2 20 












1.013 



We recommend this method. 

5.4. Suzuki's Algorithm. In successive papers, Suzuki [25] develops a very 
fast algorithm, with cost 0(y/\og x log y) operations, using the saddle point method 
to estimate ^(x,y). This is based on good approximations for a and the prime 
sums C, 4>i, 4*2 using the prime number theorem. 

For u > 1, let £ be the positive solution to the equation e ? = 1 + u£, or equiva- 
lently, £ = log(l + This last equation implies that £ ps log(ulogu), and can be 
used iteratively to evaluate £. (See §6.21 for more information on this point.) 

Let 7 = 0.57721... be Euler's constant. We now define 

(12) a s : = 1 - -i- 

logy 

and 

T a spl +^t- 1 (e t -l)dt 

(13) S(x,y):= 



a s y/2nu(l + (\ogx)/y) 
Suzuki proves the following I25[ Theorem 1.1]: 

Theorem 3. Let e < 1/2. If (loglogx) 5/3 " £ < logy < e _1 (l - e)loga; 7 the 
(14) *(x,y)=S(x,y)(l + o(l)). 
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Table 5. S(x, y, z)/^(x, y, z), x 



_ o40 



y 


z = 2 i() 


z = 2 12 


z = 2 i4 


z = 2 itJ 


z = 2^ 


z = 2' M 


2' 2 




















2 4 

















2.3033 


2 6 


0.57462 


0.60753 


0.64352 


0.68488 


0.7388 


0.82406 


2 s 


0.90295 


0.90963 


0.91195 


0.91382 


0.92929 


1.0202 


2 io 


0.94404 


0.93337 


0.92495 


0.9175 


0.9108 


0.93201 


2 12 




0.93348 


0.90717 


0.89212 


0.87883 


0.87083 


2 14 






0.90403 


0.87455 


0.85967 


0.84873 


2 16 








0.88064 


0.85461 


0.83659 


2 18 










0.85917 


0.82827 


2 20 












0.83353 



Suzuki proposed using the midpoint method to evaluate the integral L t 1 (e* — 
l)dt. We used its Maclaurin series J2 n >i £"/ ( n ' n ') 03 formulas 5.1.10 and 5.1.40]. 
We write S(x, y, z) for the function to estimate ^(x, y, z) using S(x, y) to estimate 
^(x, y) in Theorem[TJ 

Our results are presented in Table [5] We found the algorithm to be extremely 
fast, but not accurate for small y, which is not surprising given the approximations 
used. 

Earlier, in |24j . Suzuki discussed HT(ir,y, a s ) as an approximation to ^(x,y). 
Although this is faster than Algorithm HT by a factor of log logs, it is not as 
accurate, and so we chose not test its use. 

6. Numerical Details 

We elaborate on some details from the algorithms presented above. 

6.1. Estimates for a. In this subsection, we give more information about the 
function a{x, y), defined implicitly by the equation 4>i{a, y) + logx = 0. In partic- 
ular we will prove that 

(15) — — <a<2. 
y ' 2 log x ~ ~ 

We prove the lower bound first. From the prime number sum for (f>i, we see that 

-4* > 

So a, the solution to 4>i(a,y) + log a; = 0, is lower bounded by the solution /? to 

log 2 



2$ - 1 

Solving, we get 



logx. 



a>p= log(l + (log2)/(logq : )) > 1 



log 2 — 21ogx' 

the last inequality holding whenever x > 2. 

Next we show the upper bound. Let £ denote the Riemann zeta function. By 
examining their Dirichlet series, we can see that —(f>i(s,y) < — C'/C( s )- Both sides 
are decreasing smooth functions of s on (0,oo). It follows that a is upper bounded 
by the solution to C'(s)/C( s ) = l°g2, which is less than 2. 
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More precise information can be found in [12] , In particular, 




} 



holds uniformly for x > y > 2, with the explicit lower bound 



log(l + y/(51ogx)) 



a > — r^— 

logy 



The strength of this is similar to (TT5)) if y is fixed and x — > oo . 

6.2. Computing £. Here we discuss some numerical methods for solving = 
1 + u£, when u > 1. 

Let f(x) = x — log(l + hi). Then / is convex on (0,oo) with a minimum at 
x = 1 — l/u. This gives the lower bound £ > 1 — l/u+ log u. To get an upper bound, 
we observe that e x > 1 + x + x 2 /2, so £ is no larger than the positive solution to 
l+£ + £ 2 / 2 = l + which is 2(u-l). Using binary search between these bounds, 
we can get |_£J plus d bits of its fraction, with 0(logu + d) evaluations of /. 

Starting from the defining equation and taking logarithms, we get 



Suzuki [Ml Lemma 2.2] proves that this iteration, starting from log it, is linearly 
convergent to £. (Here u > e is fixed.) 

In practice, we can use the Newton iteration 



starting with the upper bound 2{u — 1). By convexity, the iterates decrease toward 
the root. We tested values of u from 2 to 1000, and for these u, about 5 iterations 
were enough to get machine accuracy (about 15D). (Note that when x is large, 
f{x) ~ x, so even if we start with a large upper bound, the second iterate will be 
much closer to the root.) 

Newton iteration does not work well when u is close to 1 , for the following reason. 
Suppose that u = 1 + e. Then, we have e + 0(e 2 ) < £ < 2e, making £ — log(l + w£) 
vanish to first order in e. Thus, when computing this factor, we will lose precision 
due to cancellation. 

If special function software is available, £ can be expressed using the Lambert W 
function. For example, in the notation of a well known Canadian computer algebra 
system, £ = — l/u — Lambert W(— 1, — e~ 1 /"/u). (The argument —1 indicates which 
branch should be used.) 

6.3. ATM Summation. The purpose of the next two subsection is to justify the 
remark made earlier that the cost of evaluating the formula HT can be lowered to 
y2/3+o(i)_ jj ere] the "o(l)" term includes factors of order log a;, so we are implicitly 
assuming that x is not outrageously large. 

If / is a function defined on the positive integers, it is multiplicative if f(mri) — 
f(jn)f(n). This is a stronger requirement than is usual in number theory, where 
m and n need only be coprime. The concept of an additive function is defined 
similarly. We will call / an ATM function (additive times multiplicative) if / = gh, 
where g is additive and h is multiplicative. 



£ = log«+l). 



(g-log(l + <))(!+<) 
1 + u(£-l) 
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The paper [2] gave an algorithm that evaluates the prime sum J2 P < y fip)i with 
/ an ATM function, in y 2 /3+o(i) steps. This generalized a previously known result, 
also explained in [2], in which / could be multiplicative. 

We now explain how these summation algorithms can be used to evaluate cj>i and 
log£. The basic idea is to approximate each of these by a "small" number of ATM 
or multiplicative prime sums. With log C in hand, we can exponentiate to get £. 

We first assume s > 0. 

Let us consider </>i first. By summing geometric series, we see that 

x - \ogp \ - \ - logp 



EE 



pS _ I Z-^/ Z-^i pks 

p<y y k>lp<y y 

Note that each inner sum involves an ATM function. We will restrict the outer sum 
to 1 < k < N, and choose N to make the truncation error, 



logp 



k>N+l p<y P 

small. 

If we interchange the order of summation, allow all p > 2, and sum geometric 
series, we can express the truncation error as 

logp 



Using the globally convergent Maclaurin series for p s , we see that l/(p s — 1) < 
l/(slogp). If we plug this in, the logp factors cancel and we get the upper bound 



1^1 1/1 f°° dt 

< 



s A^ p N S - s \ 2 Ns J 2 t Ns J s2 Ns 

provided that Ns > 2. For us, s > 1/(2 logx), and with this additional assumption 
we get 

r • i 6 logx 

[truncation error] < ^ Ns . 

Thus, to achieve truncation error less than 2~ d , we can use N — 0((log x)(log log x+ 
d))._ 

Similarly, we can use 

log cm = - E ^ - p~ s ) = E ^ E i ■ 

p<y k>l p<y 

Now each inner sum involves a multiplicative function. If we use only the inner sums 
with k < N, a similar analysis shows that choosing N = 0((log x)(log log x + G0) 
will keep the truncation error below 2~ d . 

6.4. Numerical Differentiation. In this subsection, we explain how to evaluate 

<P2(s,y) - x — 



p<y 



for use in Algorithm HT. There is no obvious way to reduce this to the kind of 
sums treated in [2], so we will use numerical differentiation. 
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We will need prime number sums for the derivatives of 0. Define polynomials 
fk € 2[X] for k > 1 by the recurrence relation 

f k+1 (X) = (X 2 + X)f k (X) - kXf k (X), f 1 (X) = -l. 

Using induction on fc, we can show that 

M*,v) = -£ J ( v s_ 1]k+ i ■ 

p<y W > 

In particular, 

^ log 4 p • (p 3s + Ap 2s + p s ) 

Ms, y) = ^ / « _ 1)4 • 

p<y 

Using balanced numerical differentiation [51 p. 297], we have 
, / x <l>i{s + h,y)-<j> 1 {s~h,y) ft 2 

<t>2{s,y) = + e ' e = y<Mf?,y) 

for some rj G [s — ft, s + ft]. Let us determine how much precision will be necessary 
to deliver d bits of 02 (s, y) accurately, when we use this formula. 
We have 

^ p s log 2 p {p 2s + 4p s + 1) log 2 p 
^ V) = X. 7-731)2 >< 7^7-1)2 ■ 

Observing that (t 2 -\- 4t + I) / (t — I) 2 is decreasing for t > 1, we get the estimate 
ft 2 ft 2 log 2 y f 2 2a +4-2 a + l ^ 

— 04(S,2/J < 77T- TTn 4>2(S,y). 



6^ v ' ay - 6 V (2 s - 1 ) 
If < s < 2, the factor in parentheses is bounded by 15/s 2 . (Numerically, anyway.) 
So, if we could use exact arithmetic, numerical differentiation would give us 

5 h 2 log 2 y 

[relative error] < - - < 10ft 2 log 2 x log 2 y. 

2 s 2 

However, we don't have exact arithmetic, so we must also analyze the loss of 
precision due to cancellation. For this, we use an ad hoc theory. When ft is small, 
the number of bits lost, when using the balanced difference formula to compute 
4>[(s), is about 

0i(s + h) — 0i (s — ft) 



log 2 



0i(s) 

since dividing by h causes no loss of precision. Note that this is a centered version 
of the usual relative error formula. Since 0i(s + h) — 0i(s — ft) ~ 2/j0i(s), we must 
bound the logarithmic derivative of 0i, or, what is the same thing, relate 02 = 4>'\ 

to 01. 

In our case, we require a lower bound for 02. Then since t/ (t — 1) is decreasing 
on (1, oo), 

v y ,y ' ^ p s - 1 p a - 1 

p<v F F 

^E7^y lo gp>-^( s ^) lo g 2 - 

p<y 

Therefore, 

0i(s + h,y)- 0i (s -h,y)~ 2ft0 2 (s,y) > -0i(s,y) • 2ftlog2, 
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as h — » 0, 

Let us now translate these results into practical advice. Suppose our goal is 
to obtain fa to d bits of precision, in the sense of relative error. By the exact 
arithmetic formula, we should choose h < 2" d / 2 /(\/T0 log 2 a;). Then we need to use 
log 2 hr 1 + 0(1) guard bits in our computation. Put more crudely, unless x is very 
large, doubling the working precision should be enough, if we select h properly. 

The following example indicates that the theory above is roughly correct. Sup- 
pose we want 10 digits of fa, for x = 10 6 , y = 10 3 , and s = 1/(2 log a;) = 0.03619... 
. Our recipe allows us to take h = 10~ 7 . With 17 digit arithmetic, we obtained 

numerical derivative = 127790.77386350000 
summation for fa = 127790.77386041727 

which agree to 11 figures. 
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